The THIS-NPs Hypothesis: A Corpus-Based Investigation
Abstract
We report on an analysis of the use of THIS-NPs, i.e., noun phrases with the determiner ‘this’ and the demonstrative pronouns ‘this’ and ‘these’. We test the THIS-NP hypothesis, a refinement and clarification of earlier proposals such as those of Linde (1979), Gundel, Hedberg, and Zacharski (1993), and Passonneau (1993), by way of a systematic analysis of the uses of these NPs in two different genres. In order to do this, we devised a reliable annotation scheme for classifying THIS-NPs in our corpus as active or not, in the sense of the hypothesis. 92% of THIS-NPs in our corpus were classified as referring to entities which are active in this sense. We tested three formalizations of the THIS-NP hypothesis. The version that received most empirical support is the following: THIS-NPs are used to refer to entities which are active but not the backward-looking center of the previous utterance.

1 The THIS-NPs Hypothesis

In formal semantics / pragmatics, noun phrases with the determiner ‘this’ and the demonstrative pronouns ‘this’ and ‘these’ (THIS-NPs henceforth) [1] have mostly been studied for their deictic function: to refer to objects in the visual situation, and particularly to objects the speaker is pointing at (Kaplan, 1979; Jarvella and Klein, 1982; André, Poesio, and Rieser, 1999).

(1) A [pointing to his house]: I have lived in this house for twenty years.

It is, however, well known that THIS-NPs can be used in other ways as well; indeed, preliminary analyses of the corpus used in this study (discussed below) suggested that only about 39% of THIS-NPs were cases of visual deixis (Poesio, 2000). A second function of ‘demonstrative’ NPs was identified by authors such as Linde (1979), Gundel, Hedberg, and Zacharski (1993), and Passonneau (1993).
These authors pointed out that pronominal THIS-NPs in particular [2] are often used to refer to a discourse entity other than the current discourse focus:

(2) Dilbert arrived to work. He saw one of his colleagues. As he was trying to avoid this person, he quickly ducked into his cubicle.

It is also known from work by, among others, Asher (1993) and Webber (1991) that THIS-NPs can be used to refer to abstract objects such as propositions or plans (Webber used the term DISCOURSE DEIXIS for these cases), as in the following example:

(3) For example, binocular stereo fusion is known to take place in a specific area of the cortex near the back of the head. Patients with damage to this area of the cortex have visual handicaps but they show no obvious impairment in their ability to think. This suggests that stereo fusion is not necessary for thought. (Webber, 1991)

What the discourse and visual deixis cases, and the cases studied by Linde and Passonneau, have in common is that in all cases the THIS-NP is used to refer to an entity which, while salient, is not the current ‘topic’ or ‘discourse focus’ (we are deliberately using these terms in a vague way here). This intuition was captured by Gundel, Hedberg, and Zacharski (1993), who developed a theory of the conditions under which referring expressions are used based on the notion of ACTIVATION HIERARCHY: a speaker’s choice of expression depends on assumptions about the ‘cognitive status’ of the referent in the hearer’s information state. Gundel et al.’s ‘activation levels’ range from TYPE IDENTIFIABILITY for indefinite NPs to IN FOCUS for pronouns. Gundel et al. propose that the use of THIS-NPs, as well as of the pronoun ‘that’ [3], requires the referent to be ACTIVATED, i.e., to be represented in current short-term memory [4].

We believe these proposals can be made at the same time broader in their coverage and more precise by (i) specifying which entities are supposed to be ‘in focus’ and (ii) being more explicit about the types of entities that can be ‘in short-term memory’ without being ‘in focus’. Our goal in this paper is to refine, clarify, and test the ideas just discussed, summarized as follows:

The THIS-NP Hypothesis: THIS-NPs are used to refer to entities which are ACTIVE but not IN FOCUS.

Notice that two notions remain to be made more precise: what it means for an entity to be ‘in focus’ and what it means for it to be ‘active’. We’ll consider each below.

[1] We will mostly avoid the use of the term ‘demonstrative’, as the starting point of this research is the realization that not all these uses are ‘demonstrative’ in Kaplan’s sense (Kaplan, 1979). We are concentrating on THIS-NPs because our corpus contains very few cases of ‘that’ noun phrases.
[2] Passonneau studied the use of ‘that’ rather than ‘this’.
[3] But not of full ‘that’ NPs, which only require the referent to have the lower ‘familiar’ status.
[4] In fact, for THIS-NPs, Gundel et al. claim that the referent has to be speaker-activated, i.e., introduced by the speaker.

2 Background: Our previous corpus analysis work

Recent years have seen an increasing interest in corpora as a means to explore linguistic generalizations, and a correspondingly increased sophistication in the methods used. This includes better techniques for storing and annotating language corpora, based on annotation standards such as XML. It also includes techniques for measuring the RELIABILITY of a given annotation scheme (Passonneau and Litman, 1993; Carletta, 1996). One of the major motivations for this work is that we felt that we could improve upon previous analyses of the uses of THIS-NPs by building on the results of our own previous corpus analyses of the uses of referring expressions in general and of salience (Poesio et al., 2000; Poesio, 2000).
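As an illustration of what measuring reliability can involve, the following is a minimal sketch of a chance-corrected agreement coefficient for two annotators, in the spirit of the kappa statistic discussed by Carletta (1996). It is not the procedure used in this study: the function name, the choice of Cohen's two-annotator variant, and the example labels are all assumptions made for illustration.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    kappa = (P_o - P_e) / (1 - P_e), where P_o is the observed agreement
    and P_e is the agreement expected by chance from each annotator's
    own label distribution.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Proportion of items on which the two annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    # Chance agreement: sum over categories of the product of marginals.
    categories = set(count_a) | set(count_b)
    expected = sum(count_a[c] * count_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical judgements on four THIS-NPs (invented for this example).
annotator_1 = ["active", "active", "inactive", "active"]
annotator_2 = ["active", "inactive", "inactive", "active"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

Other variants of the coefficient estimate chance agreement from a pooled label distribution instead of per-annotator marginals; which variant is appropriate depends on the annotation setup.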
As a result of this work we had at our disposal the GNOME corpus (further discussed below), whose NPs, the anaphoric relations between them, and their visual deixis status had been marked in a reliable way (Poesio, 2000). Secondly, we have developed methods for automatically computing the BACKWARD-LOOKING CENTER, or CB (Grosz, Joshi, and Weinstein, 1995; Walker, Joshi, and Prince, 1998), a well-known formalization of the notion of ‘local focus’, instead of relying on hand-identification, which is notoriously problematic. These methods compute the CB according to several definitions proposed in the literature, among which we were able to find the ‘best’ (i.e., those which resulted in fewer violations of the claims of Centering theory) (Poesio et al., 2000). These two previous pieces of work allowed us a more systematic exploration of the conditions under which the use of a THIS-NP was licensed, as discussed below.

2.1 Annotation Scheme

Our annotation followed a fairly systematic manual, available from the GNOME project’s home page at http://www.hcrc.ed.ac.uk/ gnome; here, we discuss the most important details of the scheme. All units of text in the GNOME corpus that might be identified with utterances (in the Centering sense) are marked as unit elements; the attributes of such elements allow us to identify finite and non-finite clauses, etc. Each NP is marked with an ne tag and with a variety of attributes capturing syntactic and semantic properties. Important attributes for our purposes are cat (specifying the type of an NP), gf (specifying its grammatical function), deix (whether the object is a visual deictic reference or not), and generic (whether the NP denotes generically or not).
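The markup just described can be read with any XML parser. The sketch below uses Python's standard library to collect THIS-NPs from a small hand-written fragment. Only the element names unit and ne and the attribute names cat, gf, deix and generic come from the description above; the id and finite attributes, every attribute value (such as "this-np" or "obj"), and the function names are hypothetical and may not match the actual GNOME files.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment; attribute values and ids are invented for illustration.
SAMPLE = """
<unit id="u1" finite="yes">
  <ne id="ne1" cat="pers-pro" gf="subj" deix="no" generic="no">he</ne>
  was trying to avoid
  <ne id="ne2" cat="this-np" gf="obj" deix="no" generic="no">this person</ne>
</unit>
"""


def nearest_unit(element, parents):
    """Walk up the tree to the closest enclosing <unit>, if any."""
    node = parents.get(element)
    while node is not None and node.tag != "unit":
        node = parents.get(node)
    return node


def find_this_nps(xml_text, this_cats=("this-np", "this-pro")):
    """Return one record per <ne> whose cat value is in this_cats.

    this_cats holds hypothetical category labels for THIS-NPs.
    """
    root = ET.fromstring(xml_text.strip())
    # ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    records = []
    for ne in root.iter("ne"):
        if ne.get("cat") not in this_cats:
            continue
        unit = nearest_unit(ne, parents)
        records.append({
            "text": "".join(ne.itertext()).strip(),
            "unit": unit.get("id") if unit is not None else None,
            "gf": ne.get("gf"),
            "deix": ne.get("deix"),
            "generic": ne.get("generic"),
        })
    return records


this_nps = find_this_nps(SAMPLE)
```

Looking up the nearest enclosing unit, instead of iterating over ne elements inside each unit, avoids reporting an NP more than once when units are nested.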
A separate ante element is used to mark anaphoric relations. The ante element itself specifies the index of the anaphoric expression and the type of semantic relation (e.g., identity), whereas one or more embedded anchor elements indicate possible antecedents (the presence of more than one anchor element indicates that the anaphoric expression is ambiguous). (See 4.)

(4) The drawing of the corner cupboard, or more
Publication date: 2002